Different radiomics annotation methods comparison in rectal cancer characterisation and prognosis prediction: a two-centre study

Objectives To explore the performance differences of multiple annotations in radiomics analysis and provide a reference for tumour annotation in large-scale medical image analysis.

Methods A total of 342 patients from two centres who underwent radical resection for rectal cancer were retrospectively studied and divided into training, internal validation, and external validation cohorts. Three predictive tasks were performed: tumour T-stage (pT), lymph node metastasis (pLNM), and disease-free survival (pDFS). Twelve radiomics models were constructed using Lasso-Logistic or Lasso-Cox regression to compare four annotation methods on T2-weighted images: 2D detailed annotation along tumour boundaries (2D), 3D detailed annotation along tumour boundaries (3D), 2D bounding box (2DBB), and 3D bounding box (3DBB). The radiomics models were then used to establish combined models incorporating clinical risk factors. The DeLong test was used to compare the receiver operating characteristic curves of the models.

Results For radiomics models, the area under the curve values ranged from 0.627 (0.518–0.728) to 0.811 (0.705–0.917) in the internal validation cohort and from 0.619 (0.469–0.754) to 0.824 (0.689–0.918) in the external validation cohort. Most radiomics models based on the four annotations did not differ significantly, except for the 3D and 3DBB models for pLNM (p = 0.0188) in the internal validation cohort. For combined models, only the 2D model differed significantly from the 2DBB (p = 0.0372) and 3D models (p = 0.0380) for pDFS.

Conclusion Radiomics and combined models constructed with 2D and bounding box annotations showed comparable performance to those with 3D and detailed annotations along tumour boundaries in rectal cancer characterisation and prognosis prediction.

Critical relevance statement For quantitative analysis of radiological images, the selection of 2D maximum tumour area or bounding box annotation is as representative and easy to operate as 3D whole tumour or detailed annotations along tumour boundaries.

Key Points There is currently a lack of discussion on whether different annotation efforts in radiomics are predictively representative. No significant differences were observed in radiomics and combined models regardless of the annotations (2D, 3D, detailed, or bounding box). Prioritise selecting the more time- and effort-saving 2D maximum area bounding box annotation.


Method S2 Description of the LASSO method
A Lasso-Logistic regression model was used in this study for the prediction of pT and pLNM. The "glmnet" function was used to fit the logistic regression model, with the "family" parameter set to "binomial" to implement binary logistic regression. The elastic-net mixing parameter "alpha" was set to 1 so that the model used L1 regularization. Cross-validation was then performed using the "cv.glmnet" function to determine the optimal regularization parameter, "lambda.min". The non-zero coefficients in the model were extracted, and the variables corresponding to these coefficients were selected as key features of the model. To quantify the predictive power of the model, a custom function, "myFun", was defined. This function takes the feature vector "x" and the model coefficients "actCoef" as inputs and calculates their dot product. The result of the dot product serves as the rad-score value for the binary classification model, which corresponds, via the logistic function, to the predicted probability of the positive class. Using the features selected in the training set and their coefficient values, the rad-score values were calculated for each sample in the internal and independent external validation cohorts.

Lasso-Cox was used for the prediction of pDFS in this study. Survival time ("futime") and survival status ("fustat") were used together as model inputs to fit the Cox proportional hazards model using the "glmnet" function, with the "family" parameter set to "cox" and the "alpha" parameter set to 1 for L1 regularization. Cross-validation was performed with "cv.glmnet", and we selected the optimal "lambda.min". According to the results of cross-validation, we extracted the non-zero coefficients in the model, and the variables corresponding to these coefficients formed the feature set of the model. Using the "myFun" function, we scored the features in the training and test sets and computed the dot product of the feature vectors "x" and the model coefficients "actCoef" to obtain the risk score and rad-score. Using the features selected in the training set and their coefficient values, the rad-score values were calculated for each sample in the internal and independent external validation cohorts. The optimal hyperparameter λ values and coefficients for Lasso-Logistic and Lasso-Cox are shown in Fig. S.1.

Result S1: The features selected for modelling
We extracted 1135 features from each of the four types of ROIs using the open-source software PyRadiomics. These included diagnostic features (n=3), first-order statistical features (n=234), shape features (n=14), and texture features (n=884). The specific process of feature screening is as follows:
(1) With ICC > 75% as the requirement, 101 low-robustness features (8.9%) were removed from the 3D annotation feature set, and 1034 high-robustness features (91.1%) were retained. The same feature types were removed from both the 3D and 3DBB feature sets. Similarly, 80 low-robustness features (7.0%) were removed from the 2D annotation feature set, and 1055 high-robustness features (93.0%) were retained. The same feature types were removed from both the 2D and 2DBB feature sets, and the four resulting high-robustness feature sets were used for subsequent analysis.
(2) After performing single-factor analysis, there are 86 remaining features in 2D
(3) After using LASSO, a group of key features was determined for each of the 3 tasks and 4 annotation types of ROI.
As shown in Table S.2, the final feature sets consist of 12 features for 2D pT, 13 features for 3D pT, 16 features for 2DBB pT, 9 features for 3DBB pT, 17 features for 2D pLNM, 14 features for 3D pLNM, 17 features for 2DBB pLNM, 10 features for 3DBB pLNM, 12 features for 2D pDFS, 13 features for 3D pDFS, 15 features for 2DBB pDFS, and 13 features for 3DBB pDFS.
Insights Imaging (2024) Zhu Y, Wei Y, Chen Z, et al.

Result S2: IDI analysis of the incremental value of radiomics in pT and pLNM
To verify the incremental value of the radiomics model compared with radiological assessment in the pT and pLNM tasks, we used the integrated discrimination improvement (IDI) index to analyze the predictive gain of the radiomics model. Specifically, we established integrated models combining the rad-score and radiological assessment by logistic regression. The receiver operating characteristic (ROC) curves comparing the diagnostic performance of the integrated models and the radiological assessment are shown in Fig. S.5. Subsequently, we quantified the gain of radiomics in the predictive ability of the models by the IDI test. As shown in Table S.3, the IDI values were all greater than 0, indicating that the addition of the rad-score improved the model. The larger the IDI value, the stronger the predictive ability of the new model compared with the baseline model. The results showed that the gains of all radiomics scores constructed based on the different annotation methods were statistically significant (p<0.0001). This further confirms the validity and superiority of radiomics models in predicting pathological tumour T-stage and LNM.
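The IDI used in Result S2 is commonly defined as the difference in discrimination slopes between a new model and a baseline model, where the discrimination slope is the mean predicted probability among events minus the mean among non-events. The sketch below illustrates that standard definition in plain Python. The function names and the toy probabilities are hypothetical and are not taken from the study, whose analysis was presumably run in R.

```python
def mean(values):
    return sum(values) / len(values)


def discrimination_slope(probs, labels):
    # Mean predicted probability in events minus mean in non-events.
    events = [p for p, y in zip(probs, labels) if y == 1]
    non_events = [p for p, y in zip(probs, labels) if y == 0]
    return mean(events) - mean(non_events)


def idi(p_baseline, p_new, labels):
    # IDI > 0 means the new model separates events from non-events better.
    return discrimination_slope(p_new, labels) - discrimination_slope(p_baseline, labels)


# Toy data (hypothetical): 1 = positive pathological finding, 0 = negative.
labels = [1, 1, 1, 0, 0, 0]
p_radiological = [0.6, 0.5, 0.4, 0.5, 0.4, 0.3]  # baseline: radiological assessment only
p_integrated = [0.8, 0.7, 0.6, 0.3, 0.2, 0.1]    # new: rad-score + radiological assessment

idi_value = idi(p_radiological, p_integrated, labels)
print(round(idi_value, 3))
```

In this toy case the baseline slope is 0.1 and the integrated slope is 0.5, so the IDI is about 0.4. The significance test reported in Table S.3 is not reproduced here.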

Fig. S.1 The optimal hyperparameter λ values were identified through 10-fold cross-validation during the modelling of the 12 radiomics models. The λ value with the lowest cross-validation error (lambda.min) was taken as the one that best matched the true results. Lasso-Logistic was employed for the pT and pLNM prediction tasks, while Lasso-Cox was utilized for the pDFS task. The LASSO regression model identified radiomics features with non-zero coefficients.
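The λ selection in Fig. S.1 and the rad-score computation in Method S2 can be sketched as follows. This is a minimal plain-Python illustration, not the study's R code: it assumes the cross-validation errors per λ are already available, and the names `select_lambda_min`, `rad_score`, `to_probability`, the feature names, and all numbers are hypothetical. Whether the study's "myFun" includes the model intercept is not stated, so the intercept is an explicit optional argument here.

```python
import math


def select_lambda_min(lambdas, cv_errors):
    # Pick the lambda with the lowest mean cross-validation error.
    best = min(range(len(lambdas)), key=lambda i: cv_errors[i])
    return lambdas[best]


def rad_score(x, act_coef):
    # Dot product of the feature vector and the non-zero LASSO coefficients.
    return sum(xi * ci for xi, ci in zip(x, act_coef))


def to_probability(score, intercept=0.0):
    # Logistic link: maps the linear predictor to a probability (binary tasks only).
    return 1.0 / (1.0 + math.exp(-(score + intercept)))


# Hypothetical cross-validation curve.
lambdas = [0.10, 0.05, 0.01]
cv_errors = [0.62, 0.48, 0.55]
lambda_min = select_lambda_min(lambdas, cv_errors)

# Hypothetical coefficients at lambda_min; zero coefficients are dropped.
coefs = {"feat_a": 0.8, "feat_b": 0.0, "feat_c": -0.5}
selected = [name for name, c in coefs.items() if c != 0.0]
act_coef = [coefs[name] for name in selected]

# Score one (hypothetical) validation patient with the training-set coefficients.
patient = {"feat_a": 1.0, "feat_b": 3.0, "feat_c": 2.0}
x = [patient[name] for name in selected]
score = rad_score(x, act_coef)
print(lambda_min, selected, round(score, 3), round(to_probability(score), 3))
```

For the Cox task the same dot product gives a relative risk score, and the logistic transform does not apply.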

Fig. S.2 Kaplan-Meier curves based on the combined models. The p-value was calculated using a two-sided log-rank test. The predicted tasks: pT, pLNM, and pDFS. 2D: detailed annotation based on maximum tumour area level; 3D: detailed annotation based on whole tumour; 2DBB: bounding box annotation based on maximum tumour area level; 3DBB: bounding box annotation based on whole tumour.
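For readers unfamiliar with how such curves are built, the Kaplan-Meier (product-limit) estimate multiplies, at each event time, the fraction of at-risk patients who did not have the event. The sketch below is a generic plain-Python illustration on hypothetical data. It reuses the variable names "futime" and "fustat" from Method S2 only for readability, and it does not implement the log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set.
        n_at_risk -= n_removed
    return curve


# Hypothetical follow-up (months) and status (1 = event, 0 = censored).
futime = [5, 8, 8, 12, 15]
fustat = [1, 1, 0, 1, 0]
curve = kaplan_meier(futime, fustat)
for t, s in curve:
    print(t, round(s, 3))
```

With this toy input the estimated survival steps down to 0.8, 0.6, and 0.3 at months 5, 8, and 12.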

Fig. S.3 Calibration curves of the combined models. In the three tasks, the calibration curves of the combined models all showed good calibration. The predicted tasks: pT, pLNM, and pDFS. 2D: detailed annotation based on maximum tumour area level; 3D: detailed annotation based on whole tumour; 2DBB: bounding box annotation based on maximum tumour area level; 3DBB: bounding box annotation based on whole tumour.

Fig. S.4 Decision curve analysis of the combined models. The x-axis represents the threshold probability, whereas the y-axis illustrates the net benefit. The decision curves indicate that all combined models generate a higher net benefit within a certain range compared to the all/no-intervention strategy. The predicted tasks: pT, pLNM, and pDFS. 2D: detailed annotation based on maximum tumour area level; 3D: detailed annotation based on whole tumour; 2DBB: bounding box annotation based on maximum tumour area level; 3DBB: bounding box annotation based on whole tumour.
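The net benefit plotted on the y-axis of a decision curve is conventionally computed as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below applies that standard formula to hypothetical predictions and compares the model with the treat-all strategy (treat-none always has a net benefit of 0). Function names and data are illustrative assumptions, not the study's code.

```python
def net_benefit(probs, labels, threshold):
    # Patients with predicted probability >= threshold are classed as positive.
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    # Everyone is classed as positive, so TP/n equals the prevalence.
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes and model probabilities.
labels = [1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2, 0.1]
threshold = 0.5

nb_model = net_benefit(probs, labels, threshold)
nb_all = net_benefit_treat_all(labels, threshold)
print(round(nb_model, 3), round(nb_all, 3))
```

Repeating the calculation over a grid of thresholds gives the full curve. At this single threshold the toy model has a net benefit of about 0.2, against about −0.2 for treat-all.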

Fig. S.5 Receiver operating characteristic curves for integrated modelling and radiological assessment in the pT and pLNM tasks.
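The area under an ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes the AUC this way on hypothetical scores. It is a generic illustration, and it does not implement the DeLong test used in the study to compare curves.

```python
def auc(scores, labels):
    # Pairwise (Mann-Whitney) formulation of the area under the ROC curve.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and scores for two competing models.
labels = [1, 1, 1, 0, 0, 0]
radiological = [0.6, 0.5, 0.4, 0.5, 0.4, 0.3]
integrated = [0.8, 0.7, 0.6, 0.3, 0.2, 0.1]

auc_radiological = auc(radiological, labels)
auc_integrated = auc(integrated, labels)
print(round(auc_radiological, 3), round(auc_integrated, 3))
```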

Table S.1 The MRI image acquisition parameters of the two centres

Table S.2 The feature sets selected for radiomics modelling

Table S.3 Significance comparison between integrated models and sole radiological assessment. Boldface indicates statistical significance (p<0.05).